ElasticWho?

ElasticSearch is a flexible and powerful open source, distributed,
real-time search and analytics engine.



Features 

• Real time analytics 

• Distributed 

• High availability 

• Multi tenant architecture 

• Full text 

• Document oriented 

• Schema free 

• RESTful API

• Per-operation persistence 



Distributed 

Start small and scale horizontally out of the box. For more capacity, 
just add more nodes and let the cluster reorganize itself. 



High Availability 




ElasticSearch clusters detect and remove failed nodes, and 

reorganize themselves. 



Multi Tenancy 



$ curl -XPUT http://localhost:9200/people

$ curl -XPUT http://localhost:9200/gems

$ curl -XPUT http://localhost:9200/gems/document/pry-0.5.9

$ curl -XGET http://localhost:9200/gems/document/pry-0.5.9



A cluster can host multiple indices which can be queried 

independently, or as a group. 
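
As a rough illustration of querying indices "independently, or as a group", the sketch below builds search URLs for one index, for several comma-separated indices, and for the whole cluster via `_all`. The helper name `search_url` and the base URL are assumptions made for this example; nothing is sent over the network.

```python
def search_url(base_url, indices=None):
    """Build a _search URL for one index, several indices, or all of them."""
    if not indices:
        # No index given: address every index in the cluster.
        target = "_all"
    else:
        # Several indices are addressed as a comma-separated list.
        target = ",".join(indices)
    return "{0}/{1}/_search".format(base_url.rstrip("/"), target)


if __name__ == "__main__":
    base = "http://localhost:9200"  # assumed local node
    print(search_url(base, ["gems"]))
    print(search_url(base, ["gems", "people"]))
    print(search_url(base))
```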



Document Oriented



{
    "_id": "pry-0.5.9",
    "_index": "gems",
    "_source": {
        "authors": [
            "John Mair (banisterfiend)"
        ],
        "autorequire": null,
        "bindir": "bin",
        "cert_chain": [],
        "date": "Sun Feb 20 11:00:00 UTC 2011",
        "default_executable": null,
        "description": "attach an irb-like session to any object at runtime",
        "email": "jrmair@gmail.com"
    }
}



Store complex real world entities in Elasticsearch as structured JSON 

documents. 



RESTful API 

Almost any operation can be performed using a simple RESTful 

interface using JSON over HTTP. 

• curl -X GET 

• curl -X PUT 

• curl -X POST 

• curl -X DELETE 
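
The four verbs map directly onto any HTTP client. The sketch below uses Python's standard `urllib.request` to build (but not send) one request per verb. The paths reuse the `gems` index from these slides; `BASE_URL` and the helper `build_request` are assumptions made for this example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local node


def build_request(method, path, body=None):
    """Build an HTTP request carrying an optional JSON body; does not send it."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


# One example per verb from the list above.
examples = [
    build_request("PUT", "/gems"),
    build_request("POST", "/gems/test/", {"name": "pry"}),
    build_request("GET", "/gems/test/_count"),
    build_request("DELETE", "/gems"),
]

if __name__ == "__main__":
    for req in examples:
        print(req.get_method(), req.full_url)
    # To actually send one against a running node:
    # urllib.request.urlopen(examples[0]).read()
```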



Apache Lucene 

ElasticSearch is built on top of Apache Lucene. Lucene is a
high-performance, full-featured Information Retrieval library, written in

Java. 



ElasticSearch Terminology 



Document 



$ curl -XGET http://localhost:9200/gems/document/pry-0.5.9 




In ElasticSearch, everything is stored as a Document. Documents can
be addressed and retrieved by querying their attributes.



Document Types 

A Document Type lets us specify document properties, so we can
differentiate between kinds of objects.



Shard 

Each Shard is a separate native Lucene Index. Sharding lets us overcome
the RAM and hard disk capacity limits of a single node.



Replica 

An exact copy of a primary Shard. Replicas help in setting up high
availability and increase query throughput.



Index 

ElasticSearch stores its data in logical Indices. Think of a table,
collection or a database.

An Index has at least 1 primary Shard, and 0 or more Replicas.
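
A small worked example of the arithmetic: every primary Shard has the configured number of Replicas, so an Index holds primaries × (1 + replicas) shard copies in total. The function name `total_shard_copies` is an assumption made for this sketch.

```python
def total_shard_copies(primaries, replicas):
    """Total shard copies in one index: each primary plus its replicas."""
    if primaries < 1:
        raise ValueError("an index needs at least 1 primary shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    return primaries * (1 + replicas)


if __name__ == "__main__":
    # 6 primaries, no replicas (the custom index created later in the workshop).
    print(total_shard_copies(6, 0))
    # 5 primaries with 1 replica each.
    print(total_shard_copies(5, 1))
```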



Cluster 

A collection of cooperating ElasticSearch nodes. Gives better
availability and performance via Index Sharding and Replicas.



ElasticSearch Workshop 



Download and start 

Download ElasticSearch from 
http://www.elasticsearch.org/download 



# service elasticsearch start 

# /etc/init.d/elasticsearch start

# ./bin/elasticsearch -f



ElasticSearch Plugins 



A site plugin to view contents of ElasticSearch cluster. 



# cd /usr/share/elasticsearch

# ./bin/plugin -install mobz/elasticsearch-head



# cd /opt/elasticsearch-0.90.2

# ./bin/plugin -install mobz/elasticsearch-head



Restart ElasticSearch. Plugins are detected and loaded on service 

startup. 



elasticsearch-head 




[Screenshot: the elasticsearch-head plugin at localhost:9200/_plugin/head/, connected to http://localhost:9200/. It lists two indices: aadhaar (size: 594b, docs: 0) and gems (size: 495b, docs: 0).]



RESTful interface



$ curl -XGET 'http://localhost:9200/'
{
    "ok" : true,
    "status" : 200,
    "name" : "Drake, Frank",
    "version" : {
        "number" : "0.90.2",
        "snapshot_build" : false,
        "lucene_version" : "4.3.1"
    },
    "tagline" : "You Know, for Search"
}



Create Index 



$ curl -XPUT 'http://localhost:9200/gems'
{
    "ok" : true,
    "acknowledged" : true
}



Cluster status 



$ curl -XGET 'localhost:9200/_status'

{"ok":true,"_shards":{"total":20,"successful":10,"failed":0},
"indices":{"gems":{"index":{"primary_size":"495b","primary_size_in_bytes":495,
"size":"495b","size_in_bytes":495},"translog":{"operations":0},
"docs":{"num_docs":0,"max_doc":0,"deleted_docs":0},"merges":
{"current":0,"current_docs":0,"current_size":"0b","current_size_in_bytes":0,
"total":0,"total_time":"0s","total_time_in_millis":0,"total_docs":0,
"total_size":"0b","total_size_in_bytes":0},



Pretty Output 



$ curl -XGET 'localhost:9200/_status?pretty'

$ curl -XGET 'localhost:9200/_status' | python -mjson.tool

$ curl -XGET 'localhost:9200/_status' | json_reformat
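
For reference, the re-indenting that the `json.tool` pipe performs can be sketched in a few lines of Python. The `compact` string is a shortened, hand-written sample shaped like the status response, and the helper name `pretty` is an assumption made for this example.

```python
import json


def pretty(raw_json):
    """Re-indent a compact JSON string so it is easier to read."""
    return json.dumps(json.loads(raw_json), indent=4, sort_keys=True)


# Shortened sample input (not a full status response).
compact = '{"ok":true,"_shards":{"total":20,"successful":10,"failed":0}}'

if __name__ == "__main__":
    print(pretty(compact))
```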



{
    "ok": true,
    "_shards": {
        "total": 20,
        "successful": 10,
        "failed": 0
    },
    "indices": {
        "gems": {
            "index": {
                "primary_size": "495b",
                "primary_size_in_bytes": 495,
                "size": "495b",
                "size_in_bytes": 495



Delete Index 



$ curl -XDELETE 'http://localhost:9200/gems'

{
    "ok" : true,
    "acknowledged" : true
}







Create custom Index



{
    "settings" : {
        "index" : {
            "number_of_shards" : 6,
            "number_of_replicas" : 0
        }
    }
}
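
The settings body can also be generated rather than typed by hand. This is a minimal sketch, assuming the setting names `number_of_shards` and `number_of_replicas`; the helper `index_settings` is hypothetical, and its printed output could be redirected into body.json.

```python
import json


def index_settings(number_of_shards, number_of_replicas):
    """Build the request body for creating an index with custom settings."""
    return {
        "settings": {
            "index": {
                "number_of_shards": number_of_shards,
                "number_of_replicas": number_of_replicas,
            }
        }
    }


if __name__ == "__main__":
    # 6 primary shards and no replicas, as on this slide.
    print(json.dumps(index_settings(6, 0), indent=4))
```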



$ curl -XPUT 'http://localhost:9200/gems' -d @body.json

{
    "ok" : true,
    "acknowledged" : true
}



Index a document 



{
    "name": "pry",
    "platform": "ruby",
    "rubygems_version": "1.5.2",
    "description": "attach an irb-like session to any object at runtime",
    "email": "anurag@example.com",
    "has_rdoc": true,
    "homepage": "http://banisterfiend.wordpress.com"
}

$ curl -XPOST 'http://localhost:9200/gems/test/' -d @body.json

{
    "ok": true,
    "_index": "gems",
    "_type": "test",
    "_id": "lsJgxiwET6eg",
    "_version": 1
}



Get document 



$ curl -XGET 'http://localhost:9200/gems/test/lsJgxiwET6eg' | python -mjson.tool

{
    "_id": "lsJgxiwET6eg",
    "_index": "gems",
    "_source": {
        "description": "attach an irb-like session to any object at runtime",
        "email": "anurag@example.com",
        "has_rdoc": true,
        "homepage": "http://banisterfiend.wordpress.com",
        "name": "pry",
        "platform": "ruby",
        "rubygems_version": "1.5.2"
    },
    "_type": "test",
    "_version": 1,
    "exists": true
}



Index another document 



{
    "name": "grit",
    "platform": "jruby",
    "rubygems_version": "2.5.0",
    "description": "Ruby library for extracting information from a git repository.",
    "email": "mojombo@github.com",
    "has_rdoc": false,
    "homepage": "http://github.com/mojombo/grit"
}

$ curl -XPOST 'http://localhost:9200/gems/test/' -d @body.json

{
    "ok": true,
    "_index": "gems",
    "_type": "test",
    "_id": "ijU0Hi2cQc2",
    "_version": 1
}



Custom Document IDs 



{
    "name": "grit",
    "platform": "jruby",
    "rubygems_version": "2.5.1",
    "description": "Ruby library for extracting information from a git repository.",
    "email": "mojombo@github.com",
    "has_rdoc": false,
    "homepage": "http://github.com/mojombo/grit"
}

$ curl -XPUT 'http://localhost:9200/gems/test/grit-2.5.1' -d @body.json

{
    "ok": true,
    "_index": "gems",
    "_type": "test",
    "_id": "grit-2.5.1",
    "_version": 1
}



IDs are unique within an Index. A document is identified by the
combination of its Document Type and ID.



Document Versions 



$ curl -XPUT 'http://localhost:9200/gems/test/grit-2.5.1' -d @body.json
{
    "ok" : true,
    "_index" : "gems",
    "_type" : "test",
    "_id" : "grit-2.5.1",
    "_version" : 2
}



Searching Documents 



{
    "query": {
        "term": {"name": "pry"}
    }
}

$ curl -XPOST http://localhost:9200/gems/_search -d @body.json | python -mjson.tool



{
    "_shards": {
        "failed": 0,
        "successful": 6,
        "total": 6
    },
    "hits": {
        "hits": [
            {
                "_id": "MWkKgzsMRgK",
                "_index": "gems",
                "_score": 1.4054651,
                "_source": {
                    "description": "attach an irb-like session to any object at runtime",
                    "email": "anurag@example.com",
                    "has_rdoc": true,
                    "homepage": "http://banisterfiend.wordpress.com",
                    "name": "pry",
                    "platform": "ruby",
                    "rubygems_version": "1.5.2"



Counting Documents 



{
    "term": {"name": "pry"}
}

$ curl -XGET http://localhost:9200/gems/test/_count -d @body.json
{
    "_shards": {
        "failed": 0,
        "successful": 6,
        "total": 6
    },
    "count": 1
}



Update a Document 



{
    "doc": {
        "platform": "macruby"
    }
}

$ curl -XPOST http://localhost:9200/gems/test/grit-2.5.1/_update -d @body.json

{
    "ok" : true,
    "_index" : "gems",
    "_type" : "test",
    "_id" : "grit-2.5.1",
    "_version" : 4
}



The partial document is merged into the existing document using a
simple recursive merge.
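
To make "recursive merge" concrete, here is a minimal Python sketch of the idea: nested objects are merged key by key, while any other value in the partial document replaces the stored one. This only illustrates the concept and is not ElasticSearch's own implementation; the function name `recursive_merge` and the `meta` field are made up for the example.

```python
import copy


def recursive_merge(existing, partial):
    """Return a copy of `existing` with `partial` merged into it."""
    merged = copy.deepcopy(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides hold an object: merge them key by key.
            merged[key] = recursive_merge(merged[key], value)
        else:
            # Scalars, lists and new keys simply replace the old value.
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    stored = {"name": "grit", "platform": "jruby",
              "meta": {"downloads": 10, "stars": 2}}  # "meta" is hypothetical
    update = {"platform": "macruby", "meta": {"stars": 3}}
    print(recursive_merge(stored, update))
```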



Update via Script 



{
    "script" : "ctx._source.platform = vm_name",
    "params" : {
        "vm_name" : "rubinius"
    }
}

$ curl -XPOST http://localhost:9200/gems/test/grit-2.5.1/_update -d @body.json

{
    "ok" : true,
    "_index" : "gems",
    "_type" : "test",
    "_id" : "grit-2.5.1",
    "_version" : 5
}



Delete Document 



$ curl -XDELETE 'http://localhost:9200/gems/test/grit-2.5.1'

{
    "ok" : true,
    "found" : true,
    "_index" : "gems",
    "_type" : "test",
    "_id" : "grit-2.5.1",
    "_version" : 6
}



Put Mapping 



{
    "gem" : {
        "properties" : {
            "name" : {"type" : "string", "index": "not_analyzed"},
            "platform" : {"type" : "string", "index": "not_analyzed"},
            "rubygems_version" : {"type" : "string", "index": "not_analyzed"},
            "description" : {"type" : "string", "store" : "yes"},
            "has_rdoc" : {"type" : "boolean"}
        }
    }
}

$ curl -XGET 'http://localhost:9200/gems/_mapping' | python -mjson.tool
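
The mapping leaves `description` analyzed (the default for strings) and marks the other string fields as not analyzed. The toy sketch below illustrates what that difference means for exact term lookups. The `analyze` function is a crude stand-in (lowercase, split on non-alphanumerics), not Lucene's real analyzer, and all names here are assumptions made for the example.

```python
import re


def analyze(value):
    """Toy analyzer: lowercase the text and split it into word tokens."""
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]


def not_analyzed(value):
    """A field that is not analyzed is indexed as one single, unchanged term."""
    return [value]


def term_matches(indexed_terms, term):
    """A term query only matches terms exactly as they were indexed."""
    return term in indexed_terms


if __name__ == "__main__":
    description = "Ruby library for extracting information from a git repository."
    print(term_matches(analyze(description), "git"))        # analyzed field
    print(term_matches(not_analyzed(description), "git"))   # whole value only
    print(term_matches(not_analyzed("grit"), "grit"))       # exact value
```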



Index Document with Mapping 



{
    "name": "grit",
    "platform": "ruby",
    "rubygems_version": "2.5.1",
    "description": "Ruby library for extracting information from a git repository.",
    "email": "mojombo@github.com",
    "has_rdoc": false,
    "homepage": "http://github.com/mojombo/grit"
}

$ curl -XPUT 'http://localhost:9200/gems/gem/grit-2.5.1' -d @body.json
{
    "ok" : true,
    "_index" : "gems",
    "_type" : "gem",
    "_id" : "grit-2.5.1",
    "_version" : 1
}



Matching documents 



{
    "query": {
        "match" : {
            "description" : "git repository"
        }
    }
}

$ curl -XPOST http://localhost:9200/gems/gem/_search -d @body.json



Highlighting



{
    "query": {
        "match" : {
            "description" : "git repository"
        }
    },
    "highlight" : {
        "fields" : {
            "description" : {}
        }
    }
}

$ curl -XPOST http://localhost:9200/gems/gem/_search -d @body.json

"highlight": {
    "description": [
        "Ruby library for extracting information from a <em>git</em> <em>repository</em>."
    ]
}
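
What the highlighter returns can be imitated with a toy sketch: wrap every word of the field that also appears in the query in `<em>` tags. This is only an illustration of the output shape, not how Lucene's highlighter works internally, and the function name `highlight` is an assumption.

```python
import re


def highlight(text, query, pre="<em>", post="</em>"):
    """Wrap each word of `text` that also occurs in `query` in pre/post tags."""
    wanted = set(query.lower().split())

    def wrap(match):
        word = match.group(0)
        return pre + word + post if word.lower() in wanted else word

    # Visit every word; punctuation and spacing are left untouched.
    return re.sub(r"\w+", wrap, text)


if __name__ == "__main__":
    description = "Ruby library for extracting information from a git repository."
    print(highlight(description, "git repository"))
```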



Search Facets 



{
    "query": { "match_all" : {} },
    "facets" : {
        "gem_names" : {
            "terms" : { "field" : "name" }
        }
    }
}

$ curl -XPOST http://localhost:9200/gems/_search -d @body.json









"facets" : {
    "gem_names" : {
        "_type": "terms",
        "missing": 0,
        "other": 0,
        "terms": [
            { "count" : 2, "term" : "pry" },
            { "count" : 2, "term" : "grit" },
            { "count" : 1, "term" : "abc" }
        ],
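
Once the response is parsed, the facet is just a list of term/count pairs. The sketch below turns it into a dictionary. The `sample` response is a hand-completed version of the fragment shown on this slide, and the facet name `gem_names` and the helper `facet_counts` are assumptions made for the example.

```python
def facet_counts(response, facet_name):
    """Map each facet term to its document count."""
    terms = response["facets"][facet_name]["terms"]
    return {entry["term"]: entry["count"] for entry in terms}


# Hand-completed sample shaped like the response fragment above.
sample = {
    "facets": {
        "gem_names": {
            "_type": "terms",
            "missing": 0,
            "other": 0,
            "terms": [
                {"count": 2, "term": "pry"},
                {"count": 2, "term": "grit"},
                {"count": 1, "term": "abc"},
            ],
        }
    }
}

if __name__ == "__main__":
    for term, count in facet_counts(sample, "gem_names").items():
        print(term, count)
```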

















(Lab) 

Analyzing Aadhaar's Datasets 



Download Public Dataset 

Download from Aadhaar Public Data Portal at 
https://data.uidai.gov.in 



Download Tools 

$ git clone https://github.com/gnurag/aadhaar 



Prepare Data & Configure 



# gem install yajl-ruby tire activesupport 

$ git clone https://github.com/gnurag/aadhaar 
$ cd aadhaar/data 

$ unzip UIDAI-ENR-DETAIL-20121001.zip 

$ cd ../bin

$ vi aadhaar.rb



Configuration 



AADHAAR_DATA_DIR = "/path/to/aadhaar/data"
ES_URL           = "http://localhost:9200"
ES_INDEX         = 'aadhaar'
ES_TYPE          = "UID"
BATCH_SIZE       = 1000



Index 

$ ruby aadhaar.rb 
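
The slides do not show how aadhaar.rb indexes internally, so the following is only a sketch of the general idea behind a batch size: split the records into fixed-size groups and send each group as one request to ElasticSearch's `_bulk` endpoint, whose body is newline-delimited JSON (an action line followed by the document). The record fields and helper names are hypothetical.

```python
import json


def batches(items, batch_size):
    """Yield consecutive lists of at most `batch_size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Last, possibly smaller, batch.
        yield batch


def bulk_payload(docs, index, doc_type):
    """Build a newline-delimited body for the _bulk endpoint."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    records = [{"row": n} for n in range(2500)]  # hypothetical records
    for group in batches(records, 1000):
        payload = bulk_payload(group, "aadhaar", "UID")
        print(len(group), "documents,", len(payload), "characters")
```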



Running Examples 

$ curl -XPOST http://localhost:9200/aadhaar/UID/_search -d @template.json | python -mjson.tool



Additional Notes 



Index Aliases 

Group multiple Indexes, and query them together. 



curl -XPOST 'http://localhost:9200/_aliases' -d '
{
    "actions" : [
        { "add" : { "index" : "index1", "alias" : "master-alias" } },
        { "add" : { "index" : "index2", "alias" : "master-alias" } }
    ]
}'

curl -XPOST 'http://localhost:9200/_aliases' -d '
{
    "actions" : [
        { "remove" : { "index" : "index2", "alias" : "master-alias" } }
    ]
}'
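
Alias bodies are easy to get wrong by hand (a missing comma or brace is enough), so one option is to generate them. This is a minimal sketch; the helper `alias_actions` is hypothetical and only builds the JSON body for the `_aliases` endpoint without sending it.

```python
import json


def alias_actions(add=(), remove=()):
    """Build an _aliases request body from (index, alias) pairs."""
    actions = []
    for index, alias in add:
        actions.append({"add": {"index": index, "alias": alias}})
    for index, alias in remove:
        actions.append({"remove": {"index": index, "alias": alias}})
    return {"actions": actions}


if __name__ == "__main__":
    body = alias_actions(add=[("index1", "master-alias"),
                              ("index2", "master-alias")])
    print(json.dumps(body, indent=4))
```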



Document Routing 

Control which Shard a document is placed on and queried from.
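
The idea behind routing can be sketched as "hash the routing value, take it modulo the number of primary Shards". By default the routing value is the document ID. The CRC32 hash below is a stand-in chosen for this example, not the hash function ElasticSearch actually uses, so the shard numbers it prints will not match a real cluster.

```python
import zlib


def shard_for(routing_value, number_of_primary_shards):
    """Pick a shard number from a routing value (stand-in hash: CRC32)."""
    digest = zlib.crc32(routing_value.encode("utf-8")) & 0xFFFFFFFF
    return digest % number_of_primary_shards


if __name__ == "__main__":
    # Documents sharing a routing value always land on the same shard.
    for doc_id in ["pry-0.5.9", "grit-2.5.1", "grit-2.5.1"]:
        print(doc_id, "-> shard", shard_for(doc_id, 6))
```

Because the result depends on the number of primary Shards, this also suggests why that number is fixed when an Index is created.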



Parents & Children 



$ curl -XPUT 'http://localhost:9200/gems/gem/roxml?parent=rexml' -d '{
    "tag" : "something"
}'



Custom Analyzers 



Boosting Search Results 



ElasticSearch Ecosystem 

A wide range of site plugins, analyzers, and river plugins is available
from the community.



THE END 

@gnurag / github 



